Cancer is the uncontrolled division of abnormal cells in the human body and can spread to other organs. It is one of the non-communicable diseases (NCDs), which account for 71% of all deaths worldwide, and lung cancer is the second most commonly diagnosed cancer after breast cancer in women. The survival rate for lung cancer is only 19%. Various methods are used to diagnose lung cancer, such as X-ray, CT scan, PET-CT scan, bronchoscopy, and biopsy. However, to determine the histological subtype of lung cancer, H&E (hematoxylin and eosin) staining is widely used, where the staining is performed on tissue aspirated from a biopsy. Studies have reported that histological type is correlated with lung cancer prognosis and treatment. Early and accurate detection of lung cancer histology is therefore an urgent need, and since treatment depends on the histological type, molecular profile, and stage of the disease, analyzing histopathology images of lung cancer is of utmost importance. Hence, to speed up the vital process of lung cancer diagnosis and reduce the burden on pathologists, deep learning techniques are used. These techniques have shown improved efficacy in the analysis of histopathology slides of cancer. Several studies have reported the importance of convolutional neural networks (CNNs) in the classification of histopathology images of various cancer types such as brain, skin, breast, and lung cancer. In this study, CNNs (ResNet50, VGG-19, Inception_ResNet_V2, and DenseNet) are used for feature extraction and are guided with a triplet loss so as to increase inter-cluster distance and decrease intra-cluster distance.
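The triplet objective mentioned above can be sketched as follows. This is a minimal illustration of the standard margin-based triplet loss, not the paper's exact formulation; the Euclidean distance metric, the margin value, and the function name are assumptions.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hypothetical sketch: embeddings are (N, D) arrays from a CNN backbone.
    # d_pos: distance to a same-class sample; d_neg: distance to a different-class sample.
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    # Hinge: penalize triplets where the positive is not closer than the
    # negative by at least `margin`, pulling clusters apart.
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

Minimizing this loss simultaneously shrinks intra-cluster distances (the `d_pos` term) and grows inter-cluster distances (the `-d_neg` term), which is the clustering behavior the abstract describes.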
In this work, we present an evaluation of smaller BLOOM model variants (350m/560m and 1b3/1b7) on various natural language processing tasks. This includes GLUE - language understanding, prompt-based zero-shot and few-shot text classification and extraction, question answering, prompt-based text generation, and multi-lingual text classification to understand model strengths/weaknesses and behavior. Empirical results show that BLOOM variants under-perform on all GLUE tasks (except WNLI), question-answering, and text generation. The variants bloom for WNLI, with an accuracy of 56.3%, and for prompt-based few-shot text extraction on MIT Movies and ATIS datasets. The BLOOM variants on average have 7% greater accuracy over GPT-2 and GPT-Neo models on Director and Airline Name extraction from MIT Movies and ATIS datasets, respectively.
In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training. Previously proposed frameworks which conditionally factorize the bilingual task into its constituent monolingual parts are a promising starting point for leveraging monolingual data efficiently. However, these methods require the monolingual modules to perform language segmentation. That is, each monolingual module has to simultaneously detect CS points and transcribe speech segments of one language while ignoring those of other languages -- not a trivial task. We propose to simplify each monolingual module by allowing them to transcribe all speech segments indiscriminately with a monolingual script (i.e. transliteration). This simple modification passes the responsibility of CS point detection to subsequent bilingual modules which determine the final output by considering multiple monolingual transliterations along with external language model information. We apply this transliteration-based approach in an end-to-end differentiable neural network and demonstrate its efficacy for zero-shot CS ASR on Mandarin-English SEAME test sets.
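The bilingual recombination step described above can be sketched as a per-segment selection over the monolingual transliterations, scored by an external language model. This is a simplified illustration under stated assumptions: the function names, the toy scorer, and the segment-aligned hypothesis lists are all hypothetical, and the actual system combines hypotheses inside a differentiable network rather than by hard selection.

```python
def select_output(hyps_a, hyps_b, lm_score):
    """For each speech segment, keep whichever monolingual transliteration
    the external language model prefers; concatenating the winners yields
    the code-switched transcript."""
    out = []
    for hyp_a, hyp_b in zip(hyps_a, hyps_b):
        out.append(hyp_a if lm_score(hyp_a) >= lm_score(hyp_b) else hyp_b)
    return out
```

The key property the abstract highlights survives in this sketch: neither monolingual module needs to detect code-switch points, since the decision is deferred to the bilingual selection stage.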
We focus on the audio-visual video parsing (AVVP) problem that involves detecting audio and visual event labels with temporal boundaries. The task is especially challenging since it is weakly supervised with only event labels available as a bag of labels for each video. An existing state-of-the-art model for AVVP uses a hybrid attention network (HAN) to generate cross-modal features for both audio and visual modalities, and an attentive pooling module that aggregates predicted audio and visual segment-level event probabilities to yield video-level event probabilities. We provide a detailed analysis of modality bias in the existing HAN architecture, where a modality is completely ignored during prediction. We also propose a variant of feature aggregation in HAN that leads to an absolute gain in F-scores of about 2% and 1.6% for visual and audio-visual events at both segment-level and event-level, in comparison to the existing HAN model.
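The attentive pooling step described above, where segment-level event probabilities are aggregated into video-level probabilities, can be sketched as a per-class softmax over time. This is a generic illustration of attentive pooling, not the HAN authors' exact module; the array shapes and function name are assumptions.

```python
import numpy as np

def attentive_pool(seg_probs, att_logits):
    """seg_probs: (T, C) segment-level event probabilities.
    att_logits: (T, C) per-segment attention scores for each event class.
    Returns (C,) video-level event probabilities."""
    # Softmax over the time axis, per class (max-shifted for stability).
    w = np.exp(att_logits - att_logits.max(axis=0))
    w = w / w.sum(axis=0)
    # Attention-weighted average of segment probabilities.
    return (w * seg_probs).sum(axis=0)
```

With uniform attention this reduces to mean pooling over segments; learned attention lets confident segments dominate the video-level prediction, which is where a modality bias can creep in if one modality's scores consistently dominate the weights.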
Post-editing in automatic speech recognition (ASR) entails automatically correcting common and systematic errors produced by ASR systems. The output of an ASR system is largely prone to phonetic and spelling errors. In this paper, we propose to use a powerful pre-trained sequence-to-sequence model, BART, further adaptively trained to serve as a denoising model that corrects errors of these types. The adaptive training is performed on an augmented dataset obtained by synthetically inducing errors as well as by incorporating actual errors from an existing ASR system. We also propose a simple method to recover the output using word-level alignments. Experimental results on accented speech data show that our strategy effectively corrects a large number of ASR errors and yields improved results compared with competitive baselines. We also highlight a negative result obtained on the related task of grammatical error correction in Hindi, showing the limitations of our proposed model in capturing wider context.
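The synthetic error induction used to build the adaptive-training data can be sketched as corrupting clean transcripts with ASR-like substitutions and deletions to form (noisy, clean) pairs. This is a minimal hypothetical sketch, not the authors' pipeline: the function name, the confusion table, and the error rates are assumptions.

```python
import random

def inject_errors(words, sub_table, p=0.15, seed=0):
    """Corrupt a clean transcript to create (noisy, clean) training pairs
    for a denoising model. sub_table maps a word to phonetically
    confusable alternatives, mimicking typical ASR confusions."""
    rng = random.Random(seed)  # seeded for reproducible corpora
    noisy = []
    for w in words:
        r = rng.random()
        if r < p and w in sub_table:
            # Substitution error: swap in a confusable word.
            noisy.append(rng.choice(sub_table[w]))
        elif r < p * 1.3:
            # Deletion error: drop the word entirely.
            continue
        else:
            noisy.append(w)
    return noisy
```

In practice such synthetic pairs would be mixed with genuinely decoded ASR output, as the abstract notes, so the denoiser sees both idealized and real error distributions.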